Backpropagation-Free Multi-modal On-Device Model Adaptation via Cloud-Device Collaboration

arXiv.org Artificial Intelligence

These devices serve as data collection powerhouses, continuously amassing vast repositories of personalized multi-modal data spanning input modalities such as text, images, and videos. The potential of this continuously arriving multi-modal data is immense: it promises high-quality, tailored, device-aware services for individual users. Despite this promise, personalized device services must analyze the dynamic multi-modal data that reflects users' intentions. Prevailing artificial intelligence (AI) systems, which are primarily trained and deployed in cloud-based environments, struggle to adapt to this dynamic device data when a single static cloud model serves all users, mainly because of the distribution shift between cloud and device data, as shown in Figure 1. In other words, high-quality personalized service requires AI systems to undergo continual refinement and adaptation to keep pace with evolving personalized multi-modal data. Intuitively, one straightforward adaptation strategy is to fine-tune the cloud model on the device's multi-modal data, which can partly alleviate the cloud-device distribution shift and thus better model users' intentions. Nevertheless, we contend that the fine-tuning-adaptation (FTA) paradigm may not satisfactorily resolve device model personalization, for reasons that can be summarized as two key aspects: (1) Undesirable Annotation.